Sea ice forecasting using IceNet#
:opticon:`tag` :badge:`Polar,badge-primary` :badge:`Modelling,badge-secondary`
Purpose#
Learning objectives
- Learn about sea-ice forecasting using artificial intelligence
- Address common misconceptions about sea ice
- Learn about IceNet, a deep learning sea-ice forecasting system
- Understand the concept of lead time when forecasting sea ice
- Run IceNet to forecast sea-ice concentration in 2020 using different lead times
Abstract#
Use IceNet, a deep learning sea ice forecasting system trained on climate simulations and observational data, to teach students about sea-ice forecasting and machine learning. The notebook also addresses some common misconceptions about icebergs.
We will reuse the original Jupyter Notebook developed by Alejandro Coca-Castro, available in Rohub.
Rationale#
The USCG Healy (WAGB-20) breaks ice around the Russian-flagged tanker Renda 250 miles south of Nome, Alaska.
The rapid reduction of Arctic sea ice is linked to global climate change. New shipping routes are expected to open, and transit seasons to start much earlier and end much later.
Accurately forecasting sea-ice concentration is important for the safety of vessels in the Arctic. It is also key to optimising fuel usage and avoiding idle waiting, by timing vessel arrivals for when the routes are navigable.
Photo by shawnanggg on Unsplash
The rapid retreat and altered dynamics of Arctic sea ice have long been a symbol of global climate change. As summer sea ice transitions from largely Multi-Year Ice (MYI) to mostly First-Year Ice (FYI) with longer open water seasons, new routes are expected to become more accessible and transit seasons are expected to begin earlier and end later. While there is wide agreement on the long-term trend in sea ice retreat, there are many uncertainties in medium-term projections of sea ice extent and dynamics, as well as in short-term forecasts for navigation. The rates and variability of retreat, particularly on seasonal time scales (Boeke and Taylor 2016), are subject to much disagreement among climate models. Furthermore, the Svalbard and Barents Sea sector of the Arctic shows large variability in seasonal sea ice extent (Figure 1), and therefore places greater demands on conveying an accurate representation in sea ice models and forecasts. Moreover, about 80% of all Arctic shipping crosses Norwegian waters (St. Meld. 31 2015-2016).
Common misconceptions about icebergs#
- Sea ice floats on salt water, yet contains very little salt: most of the brine is rejected as the ice forms.
- Melting icebergs will NOT cause sea level to rise (Archimedes' principle: floating ice already displaces its own weight in water); it is the melting of land-based ice, such as glaciers and ice sheets, that raises sea level.
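The Archimedes argument can be checked with a little arithmetic. The sketch below uses typical density values (assumed here, not taken from the notebook) to show that a floating iceberg's meltwater almost exactly fills the volume the ice displaced, which is why melting sea ice barely changes sea level.

```python
# Why melting floating ice barely changes sea level (Archimedes' principle):
# a floating object displaces its own weight in water, so the meltwater
# almost exactly refills the displaced volume.
# Typical densities, in kg/m^3 (assumed illustrative values):
RHO_ICE = 917.0        # pure ice
RHO_SEAWATER = 1025.0  # surface seawater
RHO_FRESHWATER = 1000.0

# Fraction of a floating iceberg's volume below the waterline
submerged_fraction = RHO_ICE / RHO_SEAWATER
print(f"Submerged fraction: {submerged_fraction:.1%}")  # ~89.5%

# For 1 m^3 of ice: volume of (nearly fresh) meltwater vs displaced seawater
meltwater_volume = RHO_ICE / RHO_FRESHWATER  # ~0.917 m^3
displaced_volume = RHO_ICE / RHO_SEAWATER    # ~0.895 m^3
# The two volumes are nearly equal, so the sea level change is negligible.
print(f"Meltwater {meltwater_volume:.3f} m^3 vs displaced {displaced_volume:.3f} m^3")
```

The small residual difference comes from meltwater being fresher (less dense) than seawater, a second-order effect compared with melting land-based ice.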
from IPython.display import HTML
# Youtube
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/Kp5OqedXlGA?rel=0&controls=0&showinfo=0" frameborder="0" allowfullscreen></iframe>')
Impact of sea-ice on biodiversity#
- Sea-ice loss in the Arctic affects biodiversity (lower salinity, increased light penetration) and, for instance, the hunting grounds of marine and terrestrial mammals. It can trigger an earlier growing season and an increase in ice algae and phytoplankton biomass.
- Many of the expected changes are still unknown, but they are likely to be disruptive.
Scientific approach#
The paragraphs above explained the importance of sea ice for the climate and our day-to-day life. Transformations linked to climate change will also impact local populations. The reduction of Arctic sea ice is often seen as an opportunity to open new routes for shipping, as well as for fishing and exploration. However, as mentioned above, it also has significant consequences for biodiversity.
Below, we focus on using machine learning to forecast sea-ice concentration. Optimising and securing Arctic shipping corridors requires accurate sea-ice forecasts; students will learn about the method used to produce them.
Modelling approach#
IceNet is a probabilistic, deep learning sea ice forecasting system. The model, an ensemble of U-Net networks, learns how sea ice changes from climate simulations and observational data to forecast up to 6 months of monthly-averaged sea ice concentration maps at 25 km resolution. IceNet advances the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model in seasonal forecasts of summer sea ice, particularly for extreme sea ice events. IceNet was implemented in Python 3.7 using TensorFlow v2.2.0. Further details can be found in the Nature Communications paper Seasonal Arctic sea ice forecasting with probabilistic deep learning.
Highlights#
Clone and access IceNet’s codebase to produce seasonal Arctic sea ice forecasts using 3 of the 25 pre-trained IceNet models downloaded from the Polar Data Centre.
Forecast a single year, 2020, using IceNet’s preprocessed environmental input data downloaded from a Zenodo repository.
Visualise IceNet’s seasonal ice edge predictions at 4- to 1-month lead times.
Generate interactive plots comparing IceNet predictions against the ECMWF SEAS5 physics-based sea ice concentration forecast and a linear trend statistical benchmark.
Contributions#
Forked Notebook#
Original Notebook#
Alejandro Coca-Castro (author), The Alan Turing Institute, @acocac
Tom R. Andersson (reviewer), British Antarctic Survey, @tom-andersson
Nick Barlow (reviewer), The Alan Turing Institute, @nbarlowATI
Modelling codebase#
Tom R. Andersson (author), British Antarctic Survey, @tom-andersson
James Byrne (contributor), British Antarctic Survey, @JimCircadian
Tony Phillips (contributor), British Antarctic Survey
Modelling publications#
Modelling funding#
The IceNet project was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the ‘AI for Science’ theme within that grant, and The Alan Turing Institute.
Note
The notebook contributors acknowledge the IceNet developers for providing fully reproducible, public code available at https://github.com/tom-andersson/icenet-paper. Some snippets from IceNet’s source code were adapted for this notebook.
Set up path for libraries#
# system
import os
import sys
sys.path.insert(0, os.path.join(os.getcwd(), 'polar-modelling-icenet', 'icenet'))
Clone the IceNet GitHub repo#
if not os.path.isdir(os.path.join(os.getcwd(), 'polar-modelling-icenet')):
!git clone -q https://github.com/tom-andersson/icenet-paper.git polar-modelling-icenet
Load libraries#
pip install scitools-iris
Successfully installed antlr4-python3-runtime-4.7.2 cartopy-0.21.0 cf-units-3.1.1 numpy-1.22.4 scitools-iris-3.4.0 xxhash-3.1.0
Note: you may need to restart the kernel to use updated packages.
!wget -O polar-modelling-icenet/icenet/config.py https://raw.githubusercontent.com/annefou/polar-modelling-icenet/main/config.py
print("patch applied")
patch applied
# data
import json
import pandas as pd
import numpy as np
import xarray as xr
# custom functions from the icenet repo
from utils import IceNetDataLoader, create_results_dataset_index, arr_to_ice_edge_arr
# modelling
from tensorflow.keras.models import load_model
# plotting
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvas
from matplotlib.offsetbox import AnchoredText
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import holoviews as hv
import hvplot.pandas
import hvplot.xarray
from bokeh.models.formatters import DatetimeTickFormatter
import panel as pn
pn.extension()
# utils
import urllib.request
from zipfile import ZipFile
import re
from tqdm.notebook import tqdm
import calendar
from pprint import pprint
import warnings
warnings.filterwarnings(action='ignore')
pd.options.display.max_columns = 10
hv.extension('bokeh', width=100)
Set project structure#
Let’s follow the structure of the IceNet paper as indicated in its config.py source file. This structure allows us to conveniently use IceNet’s custom data loader.
# reload imports
%load_ext autoreload
%autoreload 2
# data folder
data_folder = os.getenv("HOME") + '/datahub/Reliance/sea-ice-fc'
notebook_folder = './polar-modelling-icenet'
config = {
'obs_data_folder': os.path.join(data_folder, 'obs'),
'mask_data_folder': os.path.join(data_folder, 'masks'),
'forecast_data_folder': os.path.join(data_folder, 'forecasts'),
'network_dataset_folder': os.path.join(data_folder, 'network_datasets'),
'dataloader_config_folder': os.path.join(data_folder, 'dataloader_configs'),
'network_h5_files_folder': os.path.join(data_folder, 'networks'),
'forecast_results_folder': os.path.join(data_folder, 'results'),
}
# Generate the folder structure
for folder in config.values():
    os.makedirs(folder, exist_ok=True)
Download input data and models#
IceNet consists of 25 ensemble members, i.e. 25 models. For this demonstrator we only download three of them to reduce computational cost (note that this reduces performance compared with the full ensemble). We also fetch analysis-ready, i.e. preprocessed, climate observations, ground truth sea ice concentration (SIC) and an IceNet project configuration file from a Zenodo repository. Finally, we call a script from the IceNet paper repo to generate the masks required for computing metrics and for visualisation.
Download pretrained IceNet models#
Let’s download 3 of the 25 ensemble members from the Polar Data Centre. The models are numbered from 36 to 60; for this example we use networks 36, 42 and 53. It is worth mentioning that other pre-computed results from the Nature Communications paper can also be downloaded, including the output results table, uncertainty estimates and the netCDF forecasts of the 25 ensemble members.
url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'
target_networks = [36, 42, 53]
for network in target_networks:
target_filename = os.path.join(config['network_h5_files_folder'],f'network_tempscaled_{network}.h5')
print("get pretrained IceNet model ", target_filename)
if not os.path.isfile(target_filename):
urllib.request.urlretrieve(url + f'network_tempscaled_{network}.h5?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL25ldXJhbF9uZXR3b3JrX21vZGVsL25ldHdvcmtfdGVtcHNjYWxlZF8zNi5oNQ%3D%3D',
target_filename)
get pretrained IceNet model /home/jovyan/datahub/Reliance/sea-ice-fc/networks/network_tempscaled_36.h5
get pretrained IceNet model /home/jovyan/datahub/Reliance/sea-ice-fc/networks/network_tempscaled_42.h5
get pretrained IceNet model /home/jovyan/datahub/Reliance/sea-ice-fc/networks/network_tempscaled_53.h5
Download ERA5 data (ECMWF Reanalysis)#
Raw ERA5 data from a zenodo repository#
Let’s first download raw data from a Zenodo repository, so we can visualise the data we will feed into IceNet. The same data can also be downloaded from the Copernicus Climate Data Store (registration needed, but free of charge).
filename = 'raw-dataset1.zip'
url = f'https://zenodo.org/record/7394895/files/{filename}?download=1'
zip_fpath = os.path.join(config['obs_data_folder'], filename)
if not os.path.isfile(zip_fpath) or os.path.getsize(zip_fpath) == 0:
    urllib.request.urlretrieve(url, zip_fpath)
if not os.path.isdir(os.path.join(config['obs_data_folder'], 'raw-dataset1')):
    with ZipFile(zip_fpath, 'r') as zObject:
        zObject.extractall(path=config['obs_data_folder'])
Visualize ERA5 data#
dset = xr.open_dataset(config['obs_data_folder'] + '/raw-dataset1/tas_EASE.nc')
dset
<xarray.Dataset>
Dimensions: (time: 36, yc: 432, xc: 432)
Coordinates:
* time (time) datetime64[ns] 2019-01-01 ... 2021-1...
* yc (yc) float64 5.388e+06 ... -5.388e+06
* xc (xc) float64 -5.388e+06 ... 5.388e+06
Data variables:
t2m (time, yc, xc) float64 ...
lambert_azimuthal_equal_area int32 -2147483647
Attributes:
history: 2022-11-28 19:34:30 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...
Conventions: CF-1.7
minval = 240
maxval = 300
nrows = 2
ncols = 2
# Define the figure and each axis for the 2 rows and 2 columns
proj = ccrs.LambertAzimuthalEqualArea(central_longitude=0.0, central_latitude=90.0)
fig, axs = plt.subplots(nrows=nrows,ncols=ncols,
subplot_kw={'projection': proj},
figsize=(18, 18))
# axs is a 2 dimensional array of `GeoAxes`. We will flatten it into a 1-D array
axs=axs.flatten()
for sp in range(nrows*ncols):
axs[sp] = plt.subplot(nrows, ncols, sp + 1, projection=proj)
axs[sp].set_extent([-180, 180, 70, 90], ccrs.PlateCarree())
map = dset.sel(time='2019-' + "%02d" % (9 + sp,) + '-01').t2m.plot.pcolormesh(ax=axs[sp], x='xc', y='yc',
cmap='coolwarm',
vmin=minval, vmax=maxval,
add_colorbar=False)
axs[sp].coastlines('50m')
axs[sp].gridlines()
month = dset.sel(time='2019-' + "%02d" % (9 + sp,) + '-01').time.dt.strftime('%B %Y').values
axs[sp].set_title(month, fontsize=15)
# Title for both plots
fig.suptitle('2 meter Temperature\n' , fontsize=15)
cb_ax = fig.add_axes([0.325, 0.05, 0.4, 0.04])
cbar = plt.colorbar(map, cax=cb_ax, extend='both', orientation='horizontal', fraction=0.046, pad=0.04)
cbar.ax.tick_params(labelsize=15)
cbar.ax.set_ylabel('K', fontsize=15)
plt.savefig(os.getenv("HOME") + '/b2drop/Rohub/ros/dcc9affd-a2dd-4ecf-885d-a882d7e1702f/output/T2m_September-December2019.png')
Preprocessed ERA5 data from a zenodo repository#
Let’s download analysis-ready i.e. preprocessed ERA5 observations from a zenodo repository.
Note
The analysis-ready data were generated by running python3 icenet/preproc_icenet_data.py in step 3.2) Preprocess the raw data of the icenet-paper repository. The script normalises the raw NetCDF data, downloaded with the bash script ./download_era5_data_in_parallel.sh (see step 2) Download data), and saves it as monthly NumPy files.
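To make the note concrete, here is a minimal sketch of the kind of normalisation such preprocessing performs: subtract a monthly climatology and rescale. The array shapes and variable names are illustrative assumptions, not IceNet's actual implementation.

```python
import numpy as np

# Illustrative raw field: 24 monthly time steps on a tiny 4x4 grid
rng = np.random.default_rng(0)
raw = rng.normal(loc=270.0, scale=5.0, size=(24, 4, 4))

# Monthly climatology: mean over all occurrences of each calendar month
months = np.arange(24) % 12
clim = np.stack([raw[months == m].mean(axis=0) for m in range(12)])

# Anomaly = raw minus the climatology of the matching month, then rescale
anom = raw - clim[months]
anom /= raw.std()
print(anom.shape)  # (24, 4, 4)
```

The real script additionally handles missing data, training-period statistics and saving each month as a NumPy file.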
filename = 'dataset1.zip'
url = f'https://zenodo.org/record/5516869/files/{filename}?download=1'
zip_fpath = os.path.join(config['network_dataset_folder'], filename)
if not os.path.isfile(zip_fpath) or os.path.getsize(zip_fpath) == 0:
    urllib.request.urlretrieve(url, zip_fpath)
if not os.path.isdir(os.path.join(config['network_dataset_folder'], 'dataset1')):
    with ZipFile(zip_fpath, 'r') as zObject:
        zObject.extractall(path=config['network_dataset_folder'])
Download ground truth SIC#
We additionally download the analysis-ready ground truth SIC data from a Zenodo repository.
Note
The analysis-ready ground truth SIC data were generated by running python3 icenet/download_sic_data.py in step 2) Download data of the icenet-paper repository. The script downloads and concatenates OSI-SAF SIC data, OSI-450 (1979-2015) and OSI-430-b (2016 onwards), and saves them as monthly averages in a netCDF file.
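The concatenation step the note describes can be sketched with xarray: two products covering adjacent periods are stitched along the time dimension. The dates and the 2x2 grid below are illustrative, not the real OSI-SAF records.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy SIC product builder (illustrative, not the real OSI-SAF data)
def make_sic(times):
    return xr.DataArray(
        np.zeros((len(times), 2, 2), dtype=np.float32),
        coords={'time': times}, dims=('time', 'yc', 'xc'), name='siconca')

osi450 = make_sic(pd.date_range('2014-01-01', '2015-12-01', freq='MS'))   # earlier record
osi430b = make_sic(pd.date_range('2016-01-01', '2016-12-01', freq='MS'))  # continuation

# Stitch the two records along time, as download_sic_data.py does for OSI-SAF
siconca = xr.concat([osi450, osi430b], dim='time')
print(siconca.sizes)  # 36 monthly time steps
```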
filename = config['obs_data_folder'] + '/' + 'siconca_EASE.nc'
url = f'https://zenodo.org/record/5516869/files/siconca_EASE.nc?download=1'
if not os.path.isfile(filename) or os.path.getsize(filename) == 0:
urllib.request.urlretrieve(url, filename)
Download mask#
The script icenet/gen_masks.py generates masks for land, the polar holes, the OSI-SAF monthly maximum ice extent (the active grid cell region), and the Arctic regions & coastline. Figures of the masks are saved in the ./figures folder.
if ( not os.path.isfile(config['mask_data_folder'] + '/land_mask.npy') or
not os.path.isfile(config['mask_data_folder'] + '/polarhole1_mask.npy') or
not os.path.isfile(config['mask_data_folder'] + '/active_grid_cell_mask_01.npy') ):
!python polar-modelling-icenet/icenet/gen_masks.py
Data loader#
The following lines show how to download a given IceNet configuration JSON file and read it into a custom loader, IceNetDataLoader. The loader conveniently dictates which variables are input to the networks, which climate simulations are used for pre-training, and how far ahead to forecast.
dataloader_ID = '2021_09_03_1300_icenet_demo.json'
url = f'https://zenodo.org/record/5516869/files/{dataloader_ID}?download=1'
if not os.path.isfile(config['dataloader_config_folder'] + '/' + dataloader_ID) or os.path.getsize(config['dataloader_config_folder'] + '/' + dataloader_ID) == 0:
urllib.request.urlretrieve(url, config['dataloader_config_folder'] + '/' + dataloader_ID)
with open(config['dataloader_config_folder'] + '/' + dataloader_ID, 'r') as readfile:
dataloader_config = json.load(readfile)
pprint(dataloader_config['input_data'])
{'circmonth': {'include': True, 'metadata': True},
'land': {'include': True, 'metadata': True},
'psl': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'rsds': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'rsus': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'siconca': {'abs': {'include': True, 'max_lag': 12},
'anom': {'include': False, 'max_lag': 3},
'linear_trend': {'include': True}},
'ta500': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'tas': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'tos': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'ua10': {'abs': {'include': True, 'max_lag': 3},
'anom': {'include': False, 'max_lag': 3}},
'uas': {'abs': {'include': True, 'max_lag': 1},
'anom': {'include': False, 'max_lag': 1}},
'vas': {'abs': {'include': True, 'max_lag': 1},
'anom': {'include': False, 'max_lag': 1}},
'zg250': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}},
'zg500': {'abs': {'include': False, 'max_lag': 3},
'anom': {'include': True, 'max_lag': 3}}}
The input_data element of IceNet’s JSON file lists the input variables and their settings. We use the same input data as the Nature Communications paper: SIC, 11 climate variables, statistical SIC forecasts, and metadata (see Supplementary Table 2). These layers are stacked like the RGB channels of a traditional image, amounting to 50 channels in total.
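We can sanity-check the 50-channel figure against the config printed above. The accounting below is our reading of the config, so treat it as a sketch: each included 'abs'/'anom' field contributes max_lag channels, the linear-trend SIC forecast contributes one channel per forecast month (6 here), and the metadata adds a land mask plus cosine/sine month encodings.

```python
# Channels contributed by each variable in the dataloader config above
# (only the included fields, with their max_lag values).
input_data = {
    'psl':   {'anom': 3}, 'rsds': {'anom': 3}, 'rsus': {'anom': 3},
    'siconca': {'abs': 12, 'linear_trend': 6},  # 6 = one per forecast month (assumed)
    'ta500': {'anom': 3}, 'tas': {'anom': 3}, 'tos': {'anom': 3},
    'ua10':  {'abs': 3},  'uas': {'abs': 1},  'vas': {'abs': 1},
    'zg250': {'anom': 3}, 'zg500': {'anom': 3},
}
n_channels = sum(sum(v.values()) for v in input_data.values())
n_channels += 1 + 2  # land mask + circular (cos/sin) month encoding
print(n_channels)  # 50
```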
# Load dataloader
dataloader_config_fpath = os.path.join(config['dataloader_config_folder'], dataloader_ID)
# Data loader
print("\nSetting up the data loader with config file: {}\n\n".format(dataloader_ID))
dataloader = IceNetDataLoader(dataloader_config_fpath)
print('\n\nDone.\n')
Setting up the data loader with config file: 2021_09_03_1300_icenet_demo.json
Done.
Load networks#
Let’s also load the IceNet ensemble members using the load_model function from the Keras API (TensorFlow backend).
network_regex = re.compile('^network_tempscaled_([0-9]*).h5$')
network_fpaths = [os.path.join(config['network_h5_files_folder'], f) for f in
sorted(os.listdir(config['network_h5_files_folder'])) if network_regex.match(f)]
ensemble_seeds = [network_regex.match(f)[1] for f in
sorted(os.listdir(config['network_h5_files_folder'])) if network_regex.match(f)]
networks = []
for network_fpath in network_fpaths:
print('Loading model from {}... '.format(network_fpath), end='', flush=True)
networks.append(load_model(network_fpath, compile=False))
print('Done.')
Loading model from /home/jovyan/datahub/Reliance/sea-ice-fc/networks/network_tempscaled_36.h5... Done.
Loading model from /home/jovyan/datahub/Reliance/sea-ice-fc/networks/network_tempscaled_42.h5... Done.
Loading model from /home/jovyan/datahub/Reliance/sea-ice-fc/networks/network_tempscaled_53.h5... Done.
Modelling#
Forecast settings#
Now let’s set the target model and the forecast period, from forecast_start (Jan 2020) to forecast_end (Dec 2020). We also extract the number of forecast months from IceNet’s custom dataloader.
model = 'IceNet'
forecast_start = pd.Timestamp('2020-01-01')
forecast_end = pd.Timestamp('2020-12-01')
n_forecast_months = dataloader.config['n_forecast_months']
print('\n# of forecast months: {}\n'.format(n_forecast_months))
# of forecast months: 6
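The relationship between initialisation date, lead time and target month can be made concrete with a little date arithmetic. The sketch below follows the start + (leadtime − 1) months convention used later in this notebook, where lead time 1 targets the initialisation month itself.

```python
from datetime import date

def target_month(start: date, leadtime: int) -> date:
    """Month targeted by a forecast initialised at `start` for the given
    lead time. Lead time 1 targets the initialisation month itself,
    matching the start + (leadtime - 1) months convention used below."""
    months = start.month - 1 + (leadtime - 1)
    return date(start.year + months // 12, months % 12 + 1, 1)

start = date(2020, 1, 1)
targets = [target_month(start, lt) for lt in range(1, 7)]
print([t.strftime('%Y-%m') for t in targets])
# ['2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06']
```

In other words, a December 2020 target at 3-month lead time comes from a forecast initialised in October 2020.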
Set up forecast folder#
forecast_folder = os.path.join(config['forecast_data_folder'], 'icenet', dataloader_ID, model)
print(forecast_folder)
if not os.path.exists(forecast_folder):
os.makedirs(forecast_folder)
/home/jovyan/datahub/Reliance/sea-ice-fc/forecasts/icenet/2021_09_03_1300_icenet_demo.json/IceNet
Load ground truth SIC#
print('Loading ground truth SIC... ', end='', flush=True)
true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')
true_sic_da = xr.open_dataarray(true_sic_fpath)
print('Done.')
Loading ground truth SIC... Done.
Set up forecast DataArray dictionary#
Now we set up an empty xarray DataArray in which to store IceNet’s forecasts. DataArrays let us conveniently handle, query and visualise spatio-temporal data, such as the forecasts generated by the IceNet system.
# define list of lead times
leadtimes = np.arange(1, n_forecast_months+1)
# add ensemble to the list of models
ensemble_seeds_and_mean = ensemble_seeds.copy()
ensemble_seeds_and_mean.append('ensemble')
all_target_dates = pd.date_range(
start=forecast_start,
end=forecast_end,
freq='MS'
)
all_start_dates = pd.date_range(
start=forecast_start - pd.DateOffset(months=n_forecast_months-1),
end=forecast_end,
freq='MS'
)
shape = (len(all_target_dates),
*dataloader.config['raw_data_shape'],
n_forecast_months)
coords = {
'time': all_target_dates, # To be sliced to target dates
'yc': true_sic_da.coords['yc'],
'xc': true_sic_da.coords['xc'],
'lon': true_sic_da.isel(time=0).coords['lon'],
'lat': true_sic_da.isel(time=0).coords['lat'],
'leadtime': leadtimes,
'seed': ensemble_seeds_and_mean,
'ice_class': ['no_ice', 'marginal_ice', 'full_ice']
}
# Probabilistic SIC class forecasts
dims = ('seed', 'time', 'yc', 'xc', 'leadtime', 'ice_class')
shape = (len(ensemble_seeds_and_mean), *shape, 3)
model_forecast = xr.DataArray(
data=np.zeros(shape, dtype=np.float32),
coords=coords,
dims=dims
)
Build up forecasts#
In this step, we generate IceNet’s forecasts for the target period and write them into the empty DataArray. IceNet outputs forecasts of three sea ice concentration (SIC) classes: open water (SIC ≤ 15%), marginal ice (15% < SIC < 80%) and full ice (SIC ≥ 80%), for the following 6 months, as discrete probability distributions at each grid cell.
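The class definitions can be summarised as a simple thresholding rule. A minimal sketch, using the thresholds given in the text (the boundary conventions at exactly 15% and 80% follow the inequalities quoted above):

```python
def sic_to_class(sic: float) -> str:
    """Map a sea ice concentration in [0, 1] to IceNet's three classes:
    open water (SIC <= 15%), marginal ice (15% < SIC < 80%), full ice (SIC >= 80%)."""
    if sic <= 0.15:
        return 'no_ice'
    elif sic < 0.80:
        return 'marginal_ice'
    return 'full_ice'

print([sic_to_class(s) for s in (0.05, 0.15, 0.5, 0.80, 0.95)])
# ['no_ice', 'no_ice', 'marginal_ice', 'full_ice', 'full_ice']
```

IceNet predicts a probability for each of these three classes at every grid cell, rather than a single class label.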
%%time
forecast_fpath = os.path.join(forecast_folder, f'{model.lower()}_forecasts.nc')
if not os.path.isfile(forecast_fpath):
for start_date in tqdm(all_start_dates):
# Target forecast dates for the forecast beginning at this `start_date`
target_dates = pd.date_range(
start=start_date,
end=start_date + pd.DateOffset(months=n_forecast_months-1),
freq='MS'
)
X, y, sample_weights = dataloader.data_generation([start_date])
mask = sample_weights > 0
pred = np.array([network.predict(X)[0] for network in networks])
pred *= mask # mask outside active grid cell region to zero
# concat ensemble mean to the set of network predictions
ensemble_mean_pred = pred.mean(axis=0, keepdims=True)
pred = np.concatenate([pred, ensemble_mean_pred], axis=0)
for i, (target_date, leadtime) in enumerate(zip(target_dates, leadtimes)):
if target_date in all_target_dates:
model_forecast.\
loc[:, target_date, :, :, leadtime] = pred[..., i]
print('Saving forecast NetCDF for {}... '.format(model), end='', flush=True)
model_forecast.to_netcdf(forecast_fpath)  # export forecast as NetCDF
print('Done.')
Done.
CPU times: user 995 µs, sys: 193 µs, total: 1.19 ms
Wall time: 5.1 ms
Results#
Settings#
The IceNet codebase can compute these operations either in memory or with Dask. Dask-based computation is better suited to longer target periods (see icenet/analyse_heldout_predictions.py for further details). The following lines show how to compute in memory.
Setup#
metric_compute_list = ['Binary accuracy', 'SIE error']
forecast_fpath = os.path.join(forecast_folder, f'{model.lower()}_forecasts.nc')
chunks = {'seed': 1}
icenet_forecast_da = xr.open_dataarray(forecast_fpath, chunks=chunks)
icenet_seeds = icenet_forecast_da.seed.values
Monthly masks (active grid cell regions to compute metrics over)#
mask_fpath_format = os.path.join(config['mask_data_folder'], 'active_grid_cell_mask_{}.npy')
month_mask_da = xr.DataArray(np.array(
[np.load(mask_fpath_format.format('{:02d}'.format(month))) for
month in np.arange(1, 12+1)],
))
Download previous results#
url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'
fn = '2021_07_01_183913_forecast_results.csv'
fn_suffix = '?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL3Jlc3VsdHMvZm9yZWNhc3RfcmVzdWx0cy8yMDIxXzA3XzAxXzE4MzkxM19mb3JlY2FzdF9yZXN1bHRzLmNzdg%3D%3D'
if not os.path.isfile(os.path.join(config['forecast_results_folder'],fn)):
urllib.request.urlretrieve(url + fn + fn_suffix, os.path.join(config['forecast_results_folder'],fn))
Initialise results dataframe#
Now we write the forecast results over an old results file generated for IceNet’s Nature Communications paper. The old file contains the performance of all 25 ensemble models, the ECMWF SEAS5 physics-based sea ice probability forecast and the linear trend benchmark. For the purposes of this demonstrator, we remove the existing IceNet records and replace them with the performance of the 3 assessed ensemble members.
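The results table is indexed by (Model, Ensemble member, Leadtime, Forecast date), as the code below shows. A toy version of that index, with illustrative values rather than the real 25-member records:

```python
import pandas as pd

# Toy results index: 1 model x 2 members x 2 lead times x 2 forecast dates
index = pd.MultiIndex.from_product(
    [['IceNet'],
     ['36', 'ensemble'],
     [1, 2],
     pd.date_range('2020-01-01', periods=2, freq='MS')],
    names=['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])

# One row per (model, member, leadtime, date), one column per metric
results = pd.DataFrame(index=index, columns=['Binary accuracy', 'SIE error'])
print(results.shape)  # (8, 2)
```

Metrics computed later are then written into the matching rows of this frame.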
now = pd.Timestamp.now()
new_results_df_fname = now.strftime('%Y_%m_%d_%H%M%S_forecast_results.csv')
new_results_df_fpath = os.path.join(config['forecast_results_folder'], new_results_df_fname)
print('New results will be saved to {}\n\n'.format(new_results_df_fpath))
results_df_fnames = sorted([f for f in os.listdir(config['forecast_results_folder']) if re.compile('.*.csv').match(f)])
if len(results_df_fnames) >= 1:
old_results_df_fname = results_df_fnames[-1]
old_results_df_fpath = os.path.join(config['forecast_results_folder'], old_results_df_fname)
print('\n\nLoading previous results dataset from {}'.format(old_results_df_fpath))
# Load previous results, do not interpret 'NA' as NaN
results_df = pd.read_csv(old_results_df_fpath, keep_default_na=False, comment='#')
# Remove existing IceNet results
results_df = results_df[~results_df['Model'].str.startswith('IceNet')]
# Drop spurious index column if present
results_df = results_df.drop('Unnamed: 0', axis=1, errors='ignore')
results_df['Forecast date'] = [pd.Timestamp(date) for date in results_df['Forecast date']]
results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])
# Add new models to the dataframe
multi_index = create_results_dataset_index([model], leadtimes, all_target_dates, model, icenet_seeds)
results_df = pd.concat([results_df, pd.DataFrame(index=multi_index)]).sort_index()  # DataFrame.append was removed in pandas 2.0
New results will be saved to /home/jovyan/datahub/Reliance/sea-ice-fc/results/2022_12_05_075638_forecast_results.csv
Loading previous results dataset from /home/jovyan/datahub/Reliance/sea-ice-fc/results/2022_12_04_164514_forecast_results.csv
Compute IceNet SIC#
We obtain the sea ice probability (SIC>15%) for each ensemble member and ensemble mean by summing IceNet’s marginal ice (15%<SIC<80%) and full ice class (SIC>80%) probabilities.
icenet_sip_da = icenet_forecast_da.sel(ice_class=['marginal_ice', 'full_ice']).sum('ice_class')
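The class-summing step above can be illustrated on a toy array (a sketch only; the three-class probability axis mimics IceNet’s open water / marginal ice / full ice output, and the values are made up):

```python
import numpy as np
import xarray as xr

# Toy forecast: probabilities over 3 ice classes for a 2x2 grid.
# Each grid cell's class probabilities sum to 1.
probs = xr.DataArray(
    np.array([
        [[0.90, 0.20], [0.10, 0.50]],  # open_water
        [[0.05, 0.30], [0.20, 0.30]],  # marginal_ice
        [[0.05, 0.50], [0.70, 0.20]],  # full_ice
    ]),
    dims=('ice_class', 'yc', 'xc'),
    coords={'ice_class': ['open_water', 'marginal_ice', 'full_ice']},
)

# Sea ice probability P(SIC > 15%) = P(marginal ice) + P(full ice)
sip = probs.sel(ice_class=['marginal_ice', 'full_ice']).sum('ice_class')
print(sip.values)  # [[0.1 0.8]
                   #  [0.9 0.5]]
```

The same `sel(...).sum('ice_class')` pattern is what the cell above applies to the full forecast DataArray.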
Ground truth SIC#
Let’s also load the ground-truth SIC, which was already preprocessed.
true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')
true_sic_da = xr.open_dataarray(true_sic_fpath, chunks={})
true_sic_da = true_sic_da.load()
true_sic_da = true_sic_da.sel(time=all_target_dates)
if 'Binary accuracy' in metric_compute_list:
binary_true_da = true_sic_da > 0.15
Monthwise masks#
As we show in the next section, the monthly masks, stacked into a single DataArray, let us compute metrics only over the active grid cell region.
months = [pd.Timestamp(date).month - 1 for date in all_target_dates]
mask_da = xr.DataArray(
[month_mask_da[month] for month in months],
dims=('time', 'yc', 'xc'),
coords={
'time': true_sic_da.time.values,
'yc': true_sic_da.yc.values,
'xc': true_sic_da.xc.values,
}
)
Compute performance metrics#
To analyse forecast performance, IceNet’s researchers compute two metrics: binary accuracy and sea ice extent (SIE) error. The former is computed over the active grid cell region for a given calendar month and can be seen as a normalised version of the integrated ice edge error (IIEE) (see the Methods section of IceNet’s Nature Communications paper for details). The latter, SIE error, is the difference between the overpredicted and underpredicted areas. The two metrics are complementary, with binary accuracy being more robust for assessing IceNet’s relative seasonal forecast skill for September.
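Before running the full computation, the two metrics can be sketched on a toy binary grid (illustrative values only; the 25² factor converts grid cell counts to area in km², assuming 25 km grid cells as in the notebook):

```python
import numpy as np

# Toy 3x3 binary ice maps (True = SIC > 15%) and an active-cell mask
forecast = np.array([[1, 1, 0],
                     [1, 0, 0],
                     [0, 0, 0]], dtype=bool)
truth    = np.array([[1, 1, 1],
                     [1, 0, 0],
                     [0, 0, 0]], dtype=bool)
mask     = np.ones((3, 3), dtype=bool)  # here every cell is active

# Binary accuracy: % of correctly classified active grid cells
binacc = 100 * (forecast == truth)[mask].mean()

# SIE error: (forecast extent - true extent), in km^2 for 25 km cells
sie_error = (forecast[mask].sum() - truth[mask].sum()) * 25**2

print(round(binacc, 1), sie_error)  # 88.9 -625
```

One cell is misclassified out of nine, and the forecast underpredicts the extent by one cell, hence a negative SIE error.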
print('Analysing forecasts: \n\n')
print('Computing metrics:')
print(metric_compute_list)
binary_forecast_da = icenet_sip_da > 0.5
compute_ds = xr.Dataset()
for metric in metric_compute_list:
if metric == 'Binary accuracy':
binary_correct_da = (binary_forecast_da == binary_true_da).astype(np.float32)
binary_correct_weighted_da = binary_correct_da.weighted(mask_da)
# Mean percentage of correct classifications over the active
# grid cell area
ds_binacc = (binary_correct_weighted_da.mean(dim=['yc', 'xc']) * 100)
compute_ds[metric] = ds_binacc
elif metric == 'SIE error':
binary_forecast_weighted_da = binary_forecast_da.astype(int).weighted(mask_da)
binary_true_weighted_da = binary_true_da.astype(int).weighted(mask_da)
ds_sie_error = (
binary_forecast_weighted_da.sum(['xc', 'yc']) -
binary_true_weighted_da.sum(['xc', 'yc'])
) * 25**2
compute_ds[metric] = ds_sie_error
print('Writing to results dataset...')
for compute_da in iter(compute_ds.data_vars.values()):
metric = compute_da.name
compute_df_index = results_df.loc[
pd.IndexSlice[model, :, leadtimes, all_target_dates], metric].\
droplevel(0).index
# Ensure indexes are aligned for assigning to results_df
compute_df = compute_da.to_dataframe().reset_index().\
set_index(['seed', 'leadtime', 'time']).\
reindex(index=compute_df_index)
results_df.loc[pd.IndexSlice[model, :, leadtimes, all_target_dates], metric] = \
compute_df.values
print('\nCheckpointing results dataset... ', end='', flush=True)
results_df.to_csv(new_results_df_fpath)
print('Done.')
Analysing forecasts:
Computing metrics:
['Binary accuracy', 'SIE error']
Writing to results dataset...
Checkpointing results dataset... Done.
Analysis#
In this section, we explore the forecast results and provide some interpretation. Note we use a small sample of the data, so the results are for demonstration purposes only.
Plot settings#
settings_lineplots = dict(padding=0.1, height=400, width=700, fontsize={'title': '120%','labels': '120%', 'ticks': '100%'})
Preprocess results dataset#
# Reset index to preprocess results dataset
results_df = results_df.reset_index()
results_df['Forecast date'] = pd.to_datetime(results_df['Forecast date'])
month_names = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec'])
forecast_month_names = month_names[results_df['Forecast date'].dt.month.values - 1]
results_df['Calendar month'] = forecast_month_names
results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])
# subset target period
results_df = results_df.loc(axis=0)[pd.IndexSlice[:, :, :, slice(forecast_start, forecast_end)]]
results_df = results_df.sort_index()
Let’s inspect the results pandas DataFrame reporting the monthly performance of each ensemble member over the target period.
results_df.head()
| Model | Ensemble member | Leadtime | Forecast date | Binary accuracy | SIE error | Calendar month |
|---|---|---|---|---|---|---|
| IceNet | 36 | 1 | 2020-01-01 | 95.697038 | -457500.0 | Jan |
| | | | 2020-02-01 | 97.435745 | -180000.0 | Feb |
| | | | 2020-03-01 | 97.507057 | -215625.0 | Mar |
| | | | 2020-04-01 | 96.977625 | -39375.0 | Apr |
| | | | 2020-05-01 | 97.439646 | -13750.0 | May |
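With the multi-index in place, the per-forecast scores can be aggregated, e.g. mean binary accuracy per lead time. A minimal sketch on a mock frame with the same index structure (the values here are invented, not taken from the results above):

```python
import pandas as pd

# Mock results with the same index levels as results_df
idx = pd.MultiIndex.from_product(
    [['IceNet'], [36], [1, 2], pd.to_datetime(['2020-01-01', '2020-02-01'])],
    names=['Model', 'Ensemble member', 'Leadtime', 'Forecast date'],
)
df = pd.DataFrame({'Binary accuracy': [95.7, 97.4, 94.9, 96.8]}, index=idx)

# Mean binary accuracy for each lead time, across forecast dates
summary = df.groupby(level='Leadtime')['Binary accuracy'].mean()
print(summary)
```

The same `groupby(level=...)` call works on the real `results_df` since it shares these index level names.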
Ice edge#
The following figure shows how to interactively plot the way IceNet updates its forecasts with new initial conditions as the lead time decreases, the predicted ice edge approaching the true ice edge. The observed ice edge (in magenta) is defined as the sea ice concentration (SIC) = 15% contour. IceNet’s predicted ice edge (in green) is determined from its sea ice probability forecast as the P(SIC>15%) = 0.5 contour.
The dashboard (sliders + figure) is generated through the panel library, an open-source Python library that lets you create custom interactive web apps and dashboards. In the settings below, we define two sliders which essentially allow us to interact with two variables, the month and lead time.
# set target year
year = 2020
# set sliders
month_name = [f'{calendar.month_name[m]} {year}' for m in list(range(1, 13))]
month_slider = pn.widgets.DiscreteSlider(name="Month", options=month_name, value='September 2020', width=200)
lead_slider = pn.widgets.IntSlider(name="Lead time (months)", start=1, end=4, step=1, value=4, direction='rtl', width=200)
Important
The interactive figure below essentially reproduces Figure 2 of the IceNet paper, but it covers a larger geographical extent, i.e. the March region when the ice edge extent is largest. We also visualise each month of the target period of this demonstrator (January to December 2020). Some script snippets were extracted from the IceNet script python3 icenet/plot_paper_figures.py (see line 182). Note we define alpha and colours for the coastline and land mask objects; these settings allow overlapping the layers correctly to differentiate IceNet predictions from the SIC ground truth.
## set boundaries
mask = np.load(os.path.join(config['mask_data_folder'],
'active_grid_cell_mask_{}.npy'.format('03')))
# define new region of interest
left = 124; right = 328; bot = 300; top = 126
## land and region masks
land_mask = np.load(os.path.join(config['mask_data_folder'], 'land_mask.npy'))
region_mask = np.load(os.path.join(config['mask_data_folder'], 'region_mask.npy'))
## define coastline and land layers
arr = region_mask == 13
coastline_rgba_arr = np.zeros((*arr.shape, 4))
coastline_rgba_arr[:, :, 3] = arr # alpha channel
coastline_rgba_arr[:, :, :3] = .5 # gray coastline
land_mask_rgba_arr = np.zeros((*arr.shape, 4))
land_mask_rgba_arr[:, :, 3] = land_mask # alpha channel
land_mask_rgba_arr[:, :, :3] = .5 # gray land
## line colours
pred_ice_edge_rgb = 'green'
true_ice_edge_rgb = 'magenta'
## define plot function
@pn.depends(month_slider.param.value, lead_slider.param.value)
def plot_forecast(month, leadtime):
tdate = pd.Timestamp(year,month_name.index(month)+1,1)
fig0 = Figure(figsize=(8, 8))
ax0 = fig0.subplots()
FigureCanvas(fig0) # not needed for mpl >= 3.1
ax0.imshow(coastline_rgba_arr[top:bot, left:right, :], zorder=20)
t = tdate - pd.DateOffset(months=leadtime)  # forecast initialisation date
# underlay the land mask layer defined above
ax0.imshow(land_mask_rgba_arr[top:bot, left:right, :], zorder=19)
icenet_sip = icenet_sip_da.sel(time=tdate, leadtime=leadtime, seed='ensemble').data
ax0.contour(
icenet_sip[top:bot, left:right],
levels=[0.5],
colors=[pred_ice_edge_rgb],
zorder=1,
linewidths=2.5,
)
groundtruth_sic = true_sic_da.sel(time=tdate)
gt_img = (groundtruth_sic>0.15).data
ax0.contour(
gt_img[top:bot, left:right],
levels=[0.5],
colors=[true_ice_edge_rgb],
zorder=1,
linewidths=2.5
)
ax0.tick_params(which='both', bottom=False, left=False, labelbottom=False, labelleft=False)
proxy = [plt.Line2D([0], [1], color=true_ice_edge_rgb),
plt.Line2D([0], [1], color=pred_ice_edge_rgb)]
ax0.legend(proxy, ['Observed', 'Predicted'],
loc='upper left', fontsize=11)
ax0.set_title(f'Date = {month} & Lead time = {leadtime} months ({t.strftime("%B %Y")})')
acc = results_df.loc['IceNet', 'ensemble', leadtime, tdate]['Binary accuracy']
sie_err = results_df.loc['IceNet', 'ensemble', leadtime, tdate]['SIE error']
Afont = {
'backgroundcolor': 'lightgray',
'color': 'black',
'weight': 'normal',
'size': 11,
}
t = AnchoredText('Binary acc: {:.1f}% \nSIE error: {:+.3f} mil km$^2$'.format(acc,sie_err/1e6), prop=Afont, loc='lower right', pad=0.5, borderpad=0.4, frameon=False)
t = ax0.add_artist(t)
t.zorder = 21
return pn.pane.Matplotlib(fig0, tight=True, dpi=150)
plot_ie = pn.Row(
plot_forecast,
pn.Column(pn.Spacer(height=5), month_slider, pn.Spacer(height=15), lead_slider, background='#f0f0f0', sizing_mode="fixed"),
width_policy='max', height_policy='max',
)
plot_ie.embed()